5 research outputs found

Performance engineering for HEVC transform and quantization kernel on GPUs

Author: Alen Duspara
Igor Piljić
Leon Dragić
Mario Kovač
Mate Čobrnić
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2020

The continuous growth of video traffic and video services, especially high-resolution and high-quality video content, places heavy demands on video coding and its implementations. The High Efficiency Video Coding (HEVC) standard doubles the compression efficiency of its predecessor, H.264/AVC, at the cost of high computational complexity. To address this computational load, high-performance video processing takes advantage of heterogeneous multiprocessor platforms. In this paper, we present a highly performance-optimized HEVC transform and quantization kernel with all-zero-block (AZB) identification, designed for execution on a graphics processing unit (GPU). The performance optimization strategy covers all three aspects of parallel design: exposing as much of the application's intrinsic parallelism as possible, exploiting high-throughput memory, and using instructions efficiently. It combines efficient mapping of transform blocks to thread blocks with efficient vectorized access patterns to shared memory for all transform sizes supported in the standard. Two different GPUs of the same architecture were used to evaluate the proposed implementation. The achieved processing times are 6.03 ms and 23.94 ms for DCI 4K and 8K Full Format, respectively. Speedup factors compared to CPU, cuBLAS and AVX2 implementations are up to 80, 19 and 4 times, respectively. The proposed implementation outperforms previous work by a factor of 1.22.

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Exploring manycore architectures for next-generation HPC systems through the MANGO approach

The Horizon 2020 MANGO project aims at exploring deeply heterogeneous accelerators for use in High-Performance Computing systems running multiple applications with different Quality of Service (QoS) levels. The main goal of the project is to exploit customization to adapt computing resources to reach the desired QoS. For this purpose, it explores different but interrelated mechanisms across the architecture and system software. In particular, in this paper we focus on the runtime resource management, the thermal management, and support provided for parallel programming, as well as introducing three applications on which the project foreground will be validated.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 671668.

Flich Cardo, J.; Agosta, G.; Ampletzer, P.; Atienza-Alonso, D.; Brandolese, C.; Cappe, E.; Cilardo, A.... (2018). Exploring manycore architectures for next-generation HPC systems through the MANGO approach. Microprocessors and Microsystems. 61:154-170. https://doi.org/10.1016/j.micpro.2018.05.011

Infoscience - École polytechnique fédérale de Lausanne

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

RiuNet

Implementation and optimization of FRISC based system

Author: Duspara Alen
Publication venue: University of Zagreb. Faculty of Electrical Engineering and Computing.
Publication date: 01/07/2015

This paper examines the logic of a system based on the FRISC processor. It reviews the main characteristics of the processor and describes the existing design implemented on an FPGA device. Building on that description, it elaborates ideas for further development and optimization, with emphasis on the pipelined structure of the control unit. It presents the shortcomings that limit the system's capabilities and shows the workflow of implementing a new design. A new external unit that provides an interface to an LCD display is added to the FRISC system, and the paper describes the implementation of a driver program, written in FRISC assembly language, that draws basic graphical primitives and characters on the LCD screen.

University of Zagreb Repository

Croatian Digital Thesis Repository

FER Repository

Highly parallel GPU accelerator for HEVC transform and quantization

Author: Dragić Leon
Duspara Alen
Kovač Mario
Piljić Igor
Čobrnić Mate
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 10/11/2020

An analysis of today's Internet traffic shows that digital video content prevails. Its dominance will continue to grow in the upcoming years and reach 82% of all traffic by 2021. Expressed as Internet video minutes, this equals about one million video minutes per second. Video processing devices are therefore expected to provide and support improved compression capability. This will relieve the pressure on storage systems and communication networks while creating the preconditions for further development of video services. Transform and quantization is one of the most compute-intensive parts of modern hybrid video coding systems, in which the coding algorithm itself is commonly standardized. High Efficiency Video Coding (HEVC) is a state-of-the-art video coding standard that achieves high compression efficiency at the cost of high computational complexity. In this paper we present a highly parallel GPU accelerator for HEVC transform and quantization that targets the most common heterogeneous computing system, CPU+GPU. The accelerator is implemented using the CUDA programming model. All the relevant state-of-the-art techniques related to kernel vectorization, shared memory optimization and overlapping data transfers with computation were investigated, customized and carefully combined to obtain a performance-efficient solution across all applicable transform sizes. The proposed solution is compared against a reference implementation that uses the NVIDIA cuBLAS library to perform the same work. The speedup factors obtained for a DCI 4K frame are 2.46 times for the largest transform size and 130.17 times for the smallest transform size, which reveals a substantial performance gap in this library when targeting a GPU of the Kepler architecture. The achieved processing times for frame transform and quantization are up to 4.82 ms.

Crossref

University of Zagreb Repository

FER Repository

MANGO: Exploring Manycore Architectures for Next-GeneratiOn HPC Systems

Author: Agosta Giovanni
Ampletzer Philipp
Brandolese Carlo
Cappe Etienne
Cilardo Alessandro
David Atienza Alonso
Dragic Leon
Dray Alexandre
Duspara Alen
Flich José
Fornaciari William
Guillaume Gerald
Hoornenborg Ynse
Iranfar Arman
Kovac Mario
Libutti Simone
Maitre Bruno
Martinez José Maria
Massari Giuseppe
Mlinaric Hrvoje
Papastefanakis Ermis
Picornell Tomas
Piljic Igor
Pupykina Anna
Reghenzani Federico
Staub Isabelle
Tornero Rafael
Zapater Marina
Zoni Davide
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

The Horizon 2020 MANGO project aims at exploring deeply heterogeneous accelerators for use in High-Performance Computing systems running multiple applications with different Quality of Service (QoS) levels. The main goal of the project is to exploit customization to adapt computing resources to reach the desired QoS. For this purpose, it explores different but interrelated mechanisms across the architecture and system software. In particular, in this paper we focus on the runtime resource management, the thermal management, and support provided for parallel programming, as well as introducing three applications on which the project foreground will be validated.

Infoscience - École polytechnique fédérale de Lausanne

Archivio istituzionale della ricerca - Politecnico di Milano

Archivio della ricerca - Università degli studi di Napoli Federico II

Crossref